probability estimation
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Generalizing Tree Probability Estimation via Bayesian Networks
Probability estimation is one of the fundamental tasks in statistics and machine learning. However, standard methods for probability estimation on discrete objects do not handle object structure in a satisfactory manner. In this paper, we derive a general Bayesian network formulation for probability estimation on leaf-labeled trees that enables flexible approximations which can generalize beyond observations. We show that efficient algorithms for learning Bayesian networks can be easily extended to probability estimation on this challenging structured space. Experiments on both synthetic and real data show that our methods greatly outperform the current practice of using the empirical distribution, as well as a previous effort for probability estimation on trees.
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation
We provide a detailed study of the estimation of probability distributions---discrete and continuous---in a stringent setting in which data is kept private even from the statistician. We give sharp minimax rates of convergence for estimation in these locally private settings, exhibiting fundamental tradeoffs between privacy and convergence rate, as well as providing tools to allow movement along the privacy-statistical efficiency continuum. One of the consequences of our results is that Warner's classical work on randomized response is an optimal way to perform survey sampling while maintaining privacy of the respondents.
When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values
Muller, Christophe, Scornet, Erwan, Josse, Julie
Predicting a response with partially missing inputs remains a challenging task even in parametric models, since parameter estimation in itself is not sufficient to predict on partially observed inputs. Several works study prediction in linear models. In this paper, we focus on logistic models, which present their own difficulties. From a theoretical perspective, we prove that a Pattern-by-Pattern strategy (PbP), which learns one logistic model per missingness pattern, accurately approximates Bayes probabilities in various missing data scenarios (MCAR, MAR and MNAR). Empirically, we thoroughly compare various methods (constant and iterative imputations, complete case analysis, PbP, and an EM algorithm) across classification, probability estimation, calibration, and parameter inference. Our analysis provides a comprehensive view on the logistic regression with missing values. It reveals that mean imputation can be used as baseline for low sample sizes, and improved performance is obtained via nonlinear multiple iterative imputation techniques with the labels ( MICE.RF.Y). For large sample sizes, PbP is the best method for Gaussian mixtures, and we recommend MICE.RF.Y in presence of nonlinear features.
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.50)
- Research Report > Experimental Study (0.36)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Fast and Accurate Collision Probability Estimation for Autonomous Vehicles using Adaptive Sigma-Point Sampling
Cossette, Charles Champagne, Clawson, Taylor Scott, Feit, Andrew
A novel algorithm is presented for the estimation of collision probabilities between dynamic objects with uncertain trajectories, where the trajectories are given as a sequence of poses with Gaussian distributions. W e propose an adaptive sigma-point sampling scheme, which ultimately produces a fast, simple algorithm capable of estimating the collision probability with a median error of 3.5%, and a median runtime of 0.21ms, when measured on an Intel Xeon Gold 6226R Processor . Importantly, the algorithm explicitly accounts for the collision probability's temporal dependence, which is often neglected in prior work and otherwise leads to an overestimation of the collision probability. Finally, the method is tested on a diverse set of relevant real-world scenarios, consisting of 400 6-second snippets of autonomous vehicle logs, where the accuracy and latency is rigorously evaluated.
- North America > United States > California > San Mateo County > Foster City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Wang, Liaoyaqi, Jiang, Zhengping, Liu, Anqi, Van Durme, Benjamin
We present a state-of-the-art model for fine-grained probability estimation of propositions conditioned on context. Recent advances in large language models (LLMs) have significantly enhanced their reasoning capabilities, particularly on well-defined tasks with complete information. However, LLMs continue to struggle with making accurate and well-calibrated probabilistic predictions under uncertainty or partial information. While incorporating uncertainty into model predictions often boosts performance, obtaining reliable estimates of that uncertainty remains understudied. In particular, LLM probability estimates tend to be coarse and biased towards more frequent numbers. Through a combination of human and synthetic data creation and assessment, scaling to larger models, and better supervision, we propose a set of strong and precise probability estimation models. We conduct systematic evaluations across tasks that rely on conditional probability estimation and show that our approach consistently outperforms existing fine-tuned and prompting-based methods by a large margin.
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Middle East > Jordan (0.04)
- (13 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
Leveraging Log Probabilities in Language Models to Forecast Future Events
In the constantly changing field of data-driven decision making, accurately predicting future events is crucial for strategic planning in various sectors. The emergence of Large Language Models (LLMs) marks a significant advancement in this area, offering advanced tools that utilise extensive text data for prediction. In this industry paper, we introduce a novel method for AI-driven foresight using LLMs. Building on top of previous research, we employ data on current trends and their trajectories for generating forecasts on 15 different topics. Subsequently, we estimate their probabilities via a multi-step approach based on log probabilities. We show we achieve a Brier score of 0.186, meaning a +26% improvement over random chance and a +19% improvement over widely-available AI systems.
- Europe > United Kingdom > England > Greater London > London (0.05)
- North America > United States > New York (0.04)